karen: support-first prompt + pre-purchase scenarios#317

Closed
deepmasq wants to merge 8 commits into main from feat/karen-pre-purchase-quality

Conversation

@deepmasq
Contributor

@deepmasq deepmasq commented Apr 16, 2026

Summary

Fibery #2403 — Karen Pre-purchase answers and recommendations quality.

  • Restructure very_limited expert prompt: support-first, sales-assist only on buying intent
  • Extract C.L.O.S.E.R. + BANT to skills/sales-closer/SKILL.md (loaded on demand via flexus_fetch_skill)
  • Add 3 benchmark scenario YAMLs for pre-purchase conversations
  • Fallback in prompt if skill unavailable
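The load-on-demand-with-fallback pattern can be sketched as below. This is a minimal illustration only: the real `flexus_fetch_skill` signature and the actual prompt-assembly code in `karen_prompts.py` are not shown in this PR, so `fetch_skill`, `build_sales_assist_section`, and the fallback text are all assumptions.

```python
# Illustrative sketch only. flexus_fetch_skill's real signature is not
# shown in this PR; fetch_skill below is a hypothetical stand-in.

FALLBACK_SALES_GUIDANCE = (
    "If the sales-closer skill is unavailable: listen more than you talk, "
    "clarify the problem, and offer a human when stuck."
)

def build_sales_assist_section(fetch_skill) -> str:
    """Return the sales-assist prompt section, preferring the on-demand skill.

    fetch_skill is any callable that returns the skill text by name
    (e.g. a wrapper around flexus_fetch_skill) or fails/returns None.
    """
    try:
        skill_text = fetch_skill("sales-closer")
    except Exception:
        skill_text = None
    # Fall back to the inline one-liner when the skill cannot be loaded.
    return skill_text or FALLBACK_SALES_GUIDANCE
```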

Tournament Result

Produced via /tournament (N=2). Candidate A (Opus, conservative reorder) scored 7.65, Candidate B (Sonnet, skill extraction) scored 7.05. Judge recommended SYNTHESIZE: A's prompt structure + scenarios, B's skill extraction.

| What | Source |
| --- | --- |
| Prompt structure (support/recommendations/sales-assist sections) | Candidate A |
| 3 scenario YAMLs | Candidate A |
| skills/sales-closer/SKILL.md | Candidate B |
| Skill-loading trigger + fallback line | Synthesis |

Files

| File | Change |
| --- | --- |
| karen_prompts.py | Rewrite very_limited section (+29/−19) |
| skills/sales-closer/SKILL.md | New (24 lines) — C.L.O.S.E.R. + BANT on demand |
| very_limited__pre_purchase_plan_comparison.yaml | New scenario — team comparing Pro vs Enterprise |
| very_limited__pre_purchase_just_browsing.yaml | New scenario — pure support, no buying intent |
| very_limited__pre_purchase_browsing_to_intent.yaml | New scenario — support → buying intent transition |
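For orientation, a scenario file along these lines might look like the sketch below. The actual schema used by the benchmark harness is not shown in this PR, so every field name here (`name`, `turns`, `judge_instructions`) is an assumption.

```yaml
# Hypothetical sketch of a pre-purchase benchmark scenario; the real
# schema used by these YAMLs is not shown in this PR.
name: very_limited__pre_purchase_just_browsing
description: Pure support conversation with no buying intent.
turns:
  - user: "What's the difference between your Pro and Enterprise plans?"
judge_instructions: |
  Karen should answer from flexus_vector_search results only,
  without upselling or asking unnecessary follow-up questions.
```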

Benchmark Results (staging, v1.2.231)

Scores by model (actual_rating / 10)

| Scenario | grok-4-1-fast-reasoning | claude-sonnet-4-6 | gpt-5.4 |
| --- | --- | --- | --- |
| plan_comparison | 8 | — | — |
| saas_cs_platform_short (regression) | 8 | — | — |
| just_browsing | 6 | 5 | 8 |
| browsing_to_intent | 4 | 8 | 8 |

Key findings

  • gpt-5.4 passes all scenarios at 8/10 — best tool-following compliance (uses flexus_vector_search correctly, no fabrication, no unnecessary follow-ups)
  • grok fabricates during sales transitions (hallucinated pricing, invented links) — known model behavior issue
  • claude-sonnet inconsistent — passes browsing_to_intent (8) but fails just_browsing (5) on tool selection
  • No regression on existing sales scenario (8/10 on grok)
  • Prompt iteration applied: v1 → v2 tightened "use flexus_vector_search not product_catalog", removed follow-up loophole, added anti-fabrication guardrail

Model recommendation

Karen's very_limited expert should use gpt-5.4 for customer-facing conversations. It follows support-first instructions and tool constraints most reliably. grok-4-1-fast-reasoning remains fine for the default (admin) expert where sales compliance is less critical.

Test plan

  • Run 3 new scenarios against staging (grok): plan_comparison 8, just_browsing 6, browsing_to_intent 4
  • Run regression scenario (grok): saas_cs_platform_short 8 — no regression
  • Re-run failing scenarios with claude-sonnet: just_browsing 5, browsing_to_intent 8
  • Re-run failing scenarios with gpt-5.4: just_browsing 8, browsing_to_intent 8
  • Switch very_limited expert model to gpt-5.4 (separate change)

🤖 Generated with Claude Code

Art Koval and others added 3 commits April 16, 2026 15:59
…e scenarios

tournament synthesis: A's prompt structure (support-first, explicit intent signals,
strong anti-interrogation) + B's skill extraction (sales-closer loaded on demand
via flexus_fetch_skill) + fallback if skill unavailable.

3 new benchmark scenarios: plan comparison, just browsing, browsing-to-intent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Backslash-quote inside a single-quoted YAML string broke the parser.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rication

fixes 2 failing scenarios:
- just_browsing: karen used product_catalog instead of flexus_vector_search,
  asked follow-ups in pure support mode
- browsing_to_intent: fabricated links not grounded in KB

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@deepmasq deepmasq requested a review from humbertoyusta April 17, 2026 10:15
@deepmasq deepmasq marked this pull request as ready for review April 17, 2026 10:15
Art Koval and others added 3 commits April 17, 2026 13:17
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…io, clean judge_instructions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Humberto feedback: don't tell the model "NEVER do X" when X was
never instructed. Removed "NEVER interrogate", "Don't upsell",
"NEVER push for a decision / manufacture urgency".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Ground every recommendation in flexus_vector_search() results. No invented features.
- If they say "I'll think about it" or "let me check with my team" — that's a valid outcome. Offer to help later, resolve.

## Sales-Assist (Only on Buying Intent)
Contributor


If we're splitting, and it's mostly a support bot, do we really need to separate a default support mode from sales assist?

The only sales-only part of the prompt is:

> When you detect buying intent: listen 70% talk 30%, clarify their problem, paint the outcome not features, handle objections honestly, offer a human when stuck.

which, I think, does not justify the "if this, then support; if this, then sell" branching. Just telling it to answer should be fine, in only one mode, which is mostly support.

Art Koval and others added 2 commits April 21, 2026 11:58
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@deepmasq
Contributor Author

Superseded by #322 (sales extraction from Karen) which removed the Sales-Assist section, BANT, and C.L.O.S.E.R. entirely. The pre-purchase scenarios from this PR should be re-added separately if needed.

@deepmasq deepmasq closed this Apr 21, 2026